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Abstract 

The aims of this paper are to propose, construct and analyse microscopic kinetic models for 
the emergence of long chains of RNA from monomeric /3-D-ribonucleotide precursors in prebiotic 
circumstances. Our theory starts out from similar but more general chemical assumptions to those 
of Eigen jjj], namely that catalytic replication can lead to a large population of long chains. In 
particular, our models incorporate the possibility of (i) direct chain growth, (ii) template-assisted 
synthesis and (iii) catalysis by RNA replicase ribozymes, all with varying degrees of efficiency. 
However, in our models the reaction mechanisms are kept 'open'; we do not assume the existence 
of closed hypercycles which sustain a population of long chains. Rather it is the feasibility of 
the initial emergence of a self-sustaining set of RNA chains from monomeric nucleotides which is 
our prime concern. Moreover, we confront directly the central nonlinear features of the problem, 
which have often been overlooked in previous studies. Our detailed microscopic kinetic models 
lead to kinetic equations which are generalisations of the Becker-Doring system for the step-wise 
growth of clusters or polymer chains; they lie within a general theoretical framework which has 
recently been successfully applied to a wide range of complex chemical problems. In fact, the 
most accurate model we consider has Becker-Doring aggregation terms, together with a general 
Smoluchowski fragmentation term to model the competing hydrolysis of RNA polymer chains. We 
conclude that, starting from reasonable initial conditions of monomeric nucleotide concentrations 
within a prebiotic soup and in an acceptable timescale, it is possible for a self-replicating subset 
of polyribonucleotide chains to be selected, while less efficient replicators become extinct. 



* corresponding authors 
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1 Introduction 



Since the early 1970s, an impressive body of theoretical and experimental work has been accumulating 
which supports the so-called "RNA world" view 0, ||. According to this picture, the central dogma 
of molecular biology, that "DNA makes RNA makes protein" , is replaced with the view that "in the 
beginning" both the genetic material and the (enzymatic) catalysts were comprised of RNA. Altman 
and Cech won the 1989 Nobel prize in chemistry for demonstrating that RNA does indeed have catalytic 
powers, and that there are naturally occurring enzymes made of RNA, now known as ribozymes [|J. 
According to this standpoint, the problem of the evolution of life thus divides into two parts: (i) where 
did the RNA world come from, and (ii) how did that develop into the world we know today involving 
DNA, RNA and proteins? Both of these questions are formidable, but whereas the second is widely 
appreciated, the first is more easily overlooked ||. 

This paper aims to address the first of these two questions. When the issue is considered carefully, 
it is apparent that it is by no means straight forward to provide a convincing answer. On the one hand, 
there is a lack of evidence for the spontaneous formation of the /?-D-ribonucleotide monomers that 
comprise RNA under plausible prebiotic conditions (the pyrimidine-containing compounds in particular 
have failed to appear in all experiments to date); it remains unclear how nucleotides of the right chirality 
appeared and what led to stereospecific 3' — 5' polymerization; and finally no feasible prebiotic RNA 
replicator has yet been found. However, progress is being made in this direction; for example, it is 
known that chemically-activated nucleotides can form oligomeric chains of 20-40 nucleotides in length 
in the absence of templates pj. It seems at least reasonable to be optimistic on this score and to assume 
that in the not too distant future truly autocatalytic RNA molecules will be made. 

On the other hand, due to the distinct information content carried by different nucleotide base 
sequences within RNA polymers, the problem has a combinatorial complexity which puts it into the 
class of NP hard problems in the sense of algorithmic complexity theory [|J. In particular, because 
there are four bases within RNA (A, G, C and U), a linear RNA polymer of length N monomers 
can exist as 4^ chains with different base sequences. Not only does this preclude numerical analysis 
of the problem for anything other than very small values of N, it has been used prima facie as an 
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implausibility argument against the very possibility of producing self-replicating RNA molecules from 
a prebiotic soup comprised initially of nucleotide monomers. According to this argument, to produce 
a significant concentration of any one self-replicating RNA molecule-assuming this to be chemically 
feasible-would have required a greater mass of molecules in the prebiotic soup than is available from 
the mass of the Earth M. Such an argument may be rendered invalid if the kinetics of the system are 
highly nonlinear, as indeed they are in the models we present here. 

Putting aside the several outstanding chemical questions concerning the realization of actual ex- 
amples of self-replicating RNAs on the assumption that these can and will eventually be achieved, 
the aim of the present paper is to present and analyse kinetic models of RNA chain formation and 
self-replication in prebiotic conditions. The central purpose of this work is to assess the feasibility 
of achieving viable concentrations of self-replicating RNA polymers from plausible estimates of the 
physicochemical parameters and conditions which most likely pertained under prebiotic conditions. 
(Although for the purposes of the study presented in this article, only the RNA worldview is consid- 
ered, we wish to point out that there is by no means a consensus on this question; for an overview of 
different standpoints see Fleischaker et al. ||.) 

One part of our paper is concerned with proposing detailed microscopic kinetic models of RNA 
formation and self-replication; these models are loosely based on what we refer to as the so-called 
"Becker-Doring" assumptions of classical nucleation theory. The major extensions and generalisations 
of this theory which are needed to faithfully model the complexity of the current situation lead to equa- 
tions which, as they stand, are far too complex to analyse theoretically or numerically. Moreover, the 
chemistry which these rate processes describe is far too detailed to be of direct use to experimentalists. 
Hence we formulate a "coarse-graining" reduction aimed at overcoming the combinatorial complex- 
ity inherent in these systems, and which simplifies the kinetic equations so that useful analysis and 
comparisons with experiment are possible. This reduction procedure is a major theme of the present 
paper. 



In some theoretical respects our work is close to that of Eigen jjj ^, [K], [TT], [L2| and Nuno et al. 



13 , 14, 15, 16, O, Q8| in that we are modelling the formation of RNA as a complex sequence of chemical 



reactions which proceed via catalytic and non-catalytic pathways. The main differences between our 
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approach and that of both Nuno and Eigen are that our reaction networks are all open-ended rather 
than closed cycles ( "hypercycles" ) , and that we address the full nonlinearity of the kinetic equations 
involved. 

Moreover, from a chemical perspective our work is mainly aimed at answering a significantly different 
question to that posed by Eigen and Nuno et al. Their work explains how large concentrations of long 
chains cooperate in order to maintain unexpectedly high concentrations when one might expect only 
very few, if any at all, to be present. By contrast, our work deals with an earlier stage of the process, 
namely how long chains are formed from a system which initially has only the basic building blocks 
(monomers) present. The temporal behaviour of our models is, in fact, very similar to that shown in 
figure 16 of Eigen Q. 

Because the full problem we are addressing is mathematically very complex, we approach it in a 
sequence of stages within the different sections of the paper as now outlined. In Section 0, a highly 
simplified but instructive heuristic model is described and analysed. This idealised "toy" model shows 
how the various essential rate processes involved, when combined together, produce qualitatively the 
type of behaviour we would hope to see in the more realistic and much more complex models of RNA 
formation and self-replication we later construct. 

The remainder of the paper presents and analyses various more detailed models, based on the 
Becker-Doring equations for step-wise growth. The main point to make here is that Sections ||| and |] 
are concerned respectively with detailed and coarse-grained models of RNA chain formation and growth 
in the absence of hydrolysis (which breaks up growing chains), whereas Sections |5| and |^ include this 
important process and demonstrate the dramatic effect it has on maximal chain length expected. We 
then investigate under what circumstances some species become extinct and others dominate. 

The models derived in Sections |3| and [5] are vastly more complex than the original Becker-Doring 



equations [19], [2(|-and indeed even than the generalizations we have previously developed to describe 
micelle and vesicle self- reproduction |21], |22[ , as well as generalised nucleation processes [^]-since in the 
present work we attempt to keep track of the information contained in the RNA sequences by noting the 
order of addition of nucleotides. This then enables us to incorporate the effects of template-assisted and 
ribozymic catalysis of chain formation and replication. The paper closes with a discussion in Section [7] 
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while an Appendix describes a model similar to the one developed in Section |3| but incorporating more 
complex kinetic processes. 



2 A simple heuristic model 

In this section we present a very simple heuristic model of RNA formation and self-replication which 
contains the basic and essential properties of slow growth by an uncatalysed reaction mechanism, fast 
growth due to catalysis and error-prone replication, and we show that it produces results consistent 
with our intuition - namely a long induction time, after which large concentrations of products are 
formed. 

We assume there is a steady source of nucleotides from which to build chains; then, with no catalysis 
and no errors during chain polymerization (growth), we expect an exponential growth in concentration 
(u) as described by 

du 

Now we shall allow u to depend also on the composition of the growing RNA chains (that is, on the 
sequence of nucleotide bases of which they are comprised), which we denote x. Errors in replication 
of x lead to the creation of chains with similar composition, and are thus close to x in this "sequence 
space" ; this can be thought of as a diffusive process in sequence space. Thus we propose the following 
partial differential equation 

— = DV 2 x) u + u + au 2 . (2.2) 

On the right hand side, the DV 2 ^u term describes the diffusive spread of material from one type 
of composition to another via imperfect replication, while the term au 2 models catalytic replication, 
whereby if some species (x) exists, its rate of production will depend on its own concentration u(x, t). 
Equation ( |2.2| ) has the form of a non-linear reaction-diffusion equation; indeed, equations of this general 
structure are well known and their properties have been much studied in the context of a mathematical 



theory of explosive chemical reactions (thermochemical runaway); see, for example, work by Dold |24 

Our first step is to show that there is a growing solution which is independent of the information 
in the chains, u = f(t), independent of x. From equation ( |2.2j ), this amounts to solving the ordinary 



differential equation (ODE) 

f = f + af } (2.3) 

which has a solution f(t) = foe 1 / (1 + afo(l — e')). This expression for f(t) blows up in a finite time, 
at t c = log(l + l/a/o). 

Now we show that the behaviour predicted by this solution is unstable, in the sense that a pertur- 
bation of the solution which depends on the information variable x (i.e., on the nucleotide sequence of 
the chain) has an even faster growth rate. We put u(pc,t) = f(t) + h(x,t), with h -C /. We linearise 
about f(t) and find that h(x,t) satisfies 

- = DVl )h + h + i + J a{i _ ety (2.4) 

which can be solved by separation of variables, h(x.,t) = X(pc)T(t): 

T'(t) 2af e t = ^ x) X(x) = _ 

T(t) i + a / (i_ e t) X(x) ^ { -> 

In the simplest case, where we take a one-dimensional information space x = x G R 1 , the shape in 



x-space of the variation /i(x, t) is sin(xy ji/ ' D) , cos(xy fi/D), indicating an increase in the concentration 
for some x values and a decrease in others (relative to the average concentration f(t)). In fact, this 
formulation is valid whatever dimension the space of all x's is taken to have. The temporal evolution 
of the perturbation is unaffected by X(x), and is given by 

m/ , £(l + a/ ) 2 e (Wf)t , i 

T « - (l +a /„(l -e<y < 2 ' 6 > 

This function increases more rapidly than / for t sufficiently close to t c (if // > 1 + 2a /o then /i at first 
decreases and then increases, blowing up at t = t c ). 

In summary, the solution to the heuristic equation ( |2.2|) we have derived is 



u(x ' t »~l + a/o(l-e') + d + a/„(l-e W ' (27) 

which, for times near blow-up (t « t c = log(l + l/a/o)), shows that a system with modulations in 
composition (x) has faster growth rates than a system without such dependency, indicating that some 
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species will be preferred over others on the basis of their specific chemical composition (information 
content). 

Although the very simple model presented and analysed in this section displays some rather un- 
physical behaviour-such as the presence of a singularity corresponding to the development of infinite 
concentrations of RNA chains in finite time-the very fact that it is possible to produce very high 
concentrations of products starting from arbitrarily low concentrations of reactants is encouraging for 
our quest to account for the origin of the RNA world. Our next step is to construct and analyse the 
dynamical behaviour of more realistic, and of course much more complicated, models of RNA forma- 
tion and self-replication. It is possible to tame the singularities which develop in these more realistic 
models; as we shall demonstrate, one means of achieving this is by incorporating hydrolysis into such 
schemes, which causes the breakdown of longer chains (Sections |5| and 0). 

3 Models of RNA polymerization in the absence of hydrolysis 

Our aim now is to begin to model the chemical processes involved in the formation and replication of 
RNA. We will denote the four nucleotides bases A, C, G, U by iVj with i = 1,2,3,4; and oligomeric 
ribonucleotide sequences by C7, where r signifies the number of bases in the sequence and 7 denotes 
the particular order in which they occur. 

The main reactions that such chains can undergo are: 

(i) the basic Becker-Doring rate processes controlling chain growth 



(ii) template-based chain synthesis (a form of catalysis mediated by Watson-Crick base-pairing of 
ribonucleotides on complementary chains) 



Some nucleotide sequences are better 'replicators' (templates) than others, by virtue of base- 
pairing effects. In particular we expect that the complementary chain to 7 (in the sense of 




c; + n, + c\ 
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Watson-Crick hydrogen-bonded base pairs), which we shall denote 7*, will be an extremely 
effective template for the production of 7. 



(iii) 'poisoning' or inhibition, for example by tightly-bound duplex formation 

C? + C° a # Pg. (3.3) 

In the present paper, we have not made any serious attempts to incorporate inhibition into our 
detailed models. 

(iv) hydrolysis - whereby a long chain is split into two shorter chains. Chemically this corresponds to 
the reaction C7+ s — > C r 7 + Cf . This has the form of a general fragmentation process as modelled 
by the Smoluchowski equations |2f|, a mechanism which increases the number of chains but 
reduces the average chain length. 

(v) enzymatic replication (replicase ribozymal activity) - where a third chain aids the growth of a 

chain which is already in close contact with another chain acting as a template 

CI + N l + Cr + t e * + Cl # Cffij* + C£+* + Cf ; (3.4) 

here the combination of 7 with Ni is a subsequence of the chain 7 + 9 (that is 7 + N{ C 7 + 9) 
while Cf plays the part of a replicase ribozyme. Needless to say, some of these replicases will 
have much higher efficiency than (most) of the others. 

We note in passing that, according to Eigen et al. the only efficient self-replicators have r rs 70 



(longer chains have copying errors too high to guarantee accurate replication). Our model ultimately 
confirms that there is a critical length, beyond which chains cannot self-replicate with sufficient accuracy 
to remain viable. The manner in which this critical length depends on the self-replication error rate, 



the uncatalysed growth rates and autocatalytic rates is given in Section |672 . 

We use C7 to denote the chain 7 which has length r, and other chains which it may interact with 
will be denoted Cf, whose sequence of bases is encoded in 9 and has length s. We define a flux 
corresponding to the addition of each type of monomer ({Ni}f =1 = {A, C, G, U}) to each type of chain 



S 



7- Since there are two ends to each chain there are two growth points. The reactions are given by 



C7 + Ni # C#f « , and iV< + C? # C r+ , 



(3.5) 



The rates of these reactions will include effects arising from replicase ribozymal activity, as shown 
explicitly in the flux functions J^' N below. The superscripts carry information on the exact sequences 
of bases, and of course the order of the bases is important-iVi followed by 7 is a different sequence to 
7 followed by N\. 

We start the mathematical modelling of these processes by considering only the reversible aggrega- 
tion mechanisms. These all fall into the class of Becker-Doring processes of stepwise cluster growth. In 
the first part of this paper, we shall neglect the hydrolysis and the poisoning mechanisms in order to 
investigate in detail the primary growth mechanisms that such an already formidably complex system 
allows. Later on, we shall return to study the effect of hydrolytic fragmentation mechanisms (Sections |5| 
and|6|), postponing further consideration of inhibition for a subsequent publication. 

The flux Jp> N d denotes the rate of attachment of nucleotide monomer (base) Ni to the right hand 
end of 7. Similarly Jl Nin ^ denotes the rate of attachment of monomer iVj to the left hand end of 7. The 
processes described above mention three mechanisms for growth: an uncatalysed chain growth step, 
two template-based growth steps which lead to quadratic catalysis and a ribozymal replicase process 
which is cubic in the reactant concentrations. We take template-based chain growth as being error- 
prone in that it is most effective when 9 = 7*, but it will also affect the overall rate of aggregation 
when 9 7^ 7*: 



J, 



r h)J N i. 



-M -[Ni] 

-h+m C r+l 

c r+l 



C r ' c l [Ni+y] 
c r+l 



r,s,k c s c r+k 



£: 

7+JVjC£ 



J 

(3.6) 

\ 



-r,0 



-T+l 



C Q 'c 



\p ANi,vM]J6)JZ] 

/ t / 1 ^r,s,k 

JVi+ 7 C£ 



k+r 



J 



(3.7) 



We use 7; to denote the left hand end monomer of the chain 7, and 7 r for the right end monomer; 
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7 — 7r denotes the chain 7 without the final (7,.) monomer, and —7/ + 7 the chain without its first 
element. When this information is included in the notation of a concentration or a rate constant, we 
shall place it as a superscript in square brackets to avoid any possible confusion with exponents. Using 
this notation, the kinetic equation for the concentration of the chain 7 is then 

4 4 



cj?] = J^p+7l +J [7^,7r]_^^iV ij7 ]_^ J J 7 ,Ar i ] (3 _ g) 

1=1 1=1 



(the apparent duplication of terms being due to the assumption that the model must be able to 
distinguish which end of the chain a monomer is added to). 

These equations introduce the poisoned duplices P 7 / whose concentrations pj 7 /' satisfy the equation 

Aifi] _ tM r [il M _ -Jri A JnA (q q) 

In constructing these equations, we have assumed the existence of a unique equilibrium solution for 
the cj7''s, written ej. . Provided this exists, an equilibrium solution for the poisoned duplices pj 7 /' can 
be found trivially from ( |3.9|) 



mi's 

The other main assumption we have made here is that all four types of monomer are held at a constant 
concentration, the so-called pool chemical approximation familiar in chemical kinetics. 

In the remainder of this paper, we shall ignore poisoning altogether; this can be thought of as 
equivalent to setting all the coefficients fcj 7 /', raj 7 /' equal to zero. As written above, these models are 
extremely complex and indeed represent problems that are of NP type in that their algorithmic 'size' is 
an exponential function of the chain length; this length may itself become unbounded as the reactions 
proceed. Such intractability means that any analysis must be performed on drastic simplifications of 
these kinetic equations. 

Even these kinetic equations are of course idealised and no experimental systems have yet been 
identified which exactly realise the steps we are proposing. For example, such polymerisations normally 
need activating ester linkages; and the polycondensation of mononucleotides diluted in water yields 



10 



water as a byproduct-thus reducing the concentration of all species. However it is quite feasible that 
other, more appropriate chemistries will be discovered in the not too distant future. 

We shall now analyse two simplified models, one of which exhibits only autocatalysis on the micro- 
scopic scale, while the other is fully crosscatalytic. 

To simplify the models we shall assume that there is a significant autocatalytic mechanism (specifi- 
cally we assume that the concentration of a chain cM is approximately equal to that of its complement 
cj. 7 *'), that all chains have the same small crosscatalytic effect, and that the role played by the repli- 
case ribozymes is both crosscatalytic and autocatalytic. Thus we shall replace the vast number of 
possible but generally unknown rate coefficients in (|3.6|) - (|3.7|) by just four rate constants e, a, x, C> an 
approximation which simplifies the flux functions occurring in those equations to 

(_r 7 | —[Ni] \ / \ 

- l^rc^j [e + ac^ +xE4* ] + tc^^c^ . (3.11) 

3.1 Analysis of RNA formation for the case of pure autocatalysis 

Next we neglect the breakup of chains, so that ( |3.1| ) and (|3.2|) are treated as irreversible; we also 
ignore the replicase mechanism and template crosscatalysis (that is, we neglect error-prone template 
synthesis). The resulting equations are made manageable if we analyse a case of this model in which 
chains catalyse only their own production-thus a chain cj.+i 3 can be constructed in two ways: 

O r -|- Ly^ f O r ^j , Vj t ~T O-^ -|- O r _|_j ' iO r ^j , ^O. ±Z J 

where a and e denote rate coefficients. By generalising a to a + j3r (r being the length of the chain), 
we can consider a system where longer chains can have greater catalytic effect than short chains. The 
fluxes are then defined by 

J^i = c yN, ( e + ( a + (3r)6lX^) , J?*" = c?cf (e + (a + (3r)c^V) , (3.13) 
with kinetics determined by 

ci = + j;™- - £ J? Ni - £ J?*' 1 - ( 3 - 14 ) 

i=i i=i 
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We study the system of equations in the pool chemical approximation by assuming a constant concen- 
tration of monomers, so that the total matter density will not be preserved. We now proceed to solve 
the Becker-Doring equations and show that growth of long chains can indeed occur. 

We seek a solution assuming that the rate coefficients are of magnitudes e C 1, a,/3 = 0{1) (that 
is, the uncatalysed rate is much smaller than the template catalysed rate) and with concentrations 
depending only on chain length (r). From equations (|3.13|) and ( |3.14|) , we find that 



Strictly speaking, since the equations model irreversible chain growth, the system cannot be described 
as having an equilibrium solution in the thermodynamic sense; instead, it approaches a steady state 
solution. In the case e — > 0, this is 



Increasing the catalytic effect of longer chains produces a more rapidly decaying distribution of long 
chains. This is perhaps slightly counter-intuitive: we have made it easier to make long chains and yet 
find fewer of them; the apparent paradox is resolved by noting that long chains are less likely to 'hang 
around' as they are busy forming even longer chains. In more realistic models there would be fewer 
long chains due to the larger errors encountered in replicating long chains, not to mention the effect of 
hydrolysis. 

3.2 Analysis of error-prone RNA formation 

The above analysis showed, using a very simple model, that autocatalysis alone could produce a 
population of long chains. We now turn to crosscatalysis to see if this also aids the production of 
longer chains. Here we are obliged to analyse a more complex model, where all chains act as catalysts 
for growth. In what follows, we shall analyse the simplest case, for which the rate constants are 
independent of chain length; in Appendix |A| we analyse a more complex case where the rate constants 
are proportional to chain length (and for which more singular behaviour is found) . 



8c r ci (e + (a+/3r)c r+ i) + 2c 1 c r ^ 1 (e + (a+/3(r - l))c r ) . 



(3.15) 



2 1 -rc l T{\{r + a/(3))T{\{2 + a/{3)) 
r(±(r+l+e*//3)) r(i(l+a//3)) 



(3.16) 
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In this case the rate coefficients are given by a%'% = 1, aj' = e, so that the fluxes can be written as 



^• 7 = ^ = ^( e +E4) 



(3.17) 



in place of (3.13). We continue to use equation ( |3.14 ) to relate these fluxes to the rate of change of 
concentration. We shall use f r = Y. 1 :\ 1 \=r c r ^ where the notation |7| means the length of the chain 7. 
The total monomer concentration of the monomer species is denoted fi = Y^t=i c i i an d is held fixed to 



the same constant for each of the four types of base (c 1 i = ~/i Vz). 



The form of fluxes (|3.17| ) implicitly assumes that all the rate coefficients are independent of 7, Ni, 6; 
thus we look for solutions which only depend on the length of the chain and are independent of its 
exact composition. We then obtain the simpler equation 



s=0 



fr = Vlfr-l [e + Y, fsj - 2/l/r U + £ /- 

To solve this type of equation, we define the generating function 

F(z,t) = Y,fr(t)e- rz , 



(3.18) 



(3.19) 



r=l 



which, from eqn( |3.18|) , satisfies the partial differential equation 

dF(z,t) 



Ot 



2/ 1 [e + F(0,t)] he-*-{l-e- z )F(z,t) 



(3.20) 



To solve eqn ( |3.20| ) we first find F(0,t) by letting z — > in equation ( p.20| ) to give 4:F(0,t) 



1fl[e + F(0,t)} whose solution is 

F(0,i) = (/ 1 + e)e 2 ^-e. 
Now that -F(0, t) is known, the differential equation (|3.20|) can be solved to yield 

' he-* ' 



F(z,t) 



1-e- 



1 _ exp _ (i _ e -)(i + e//l) (eVtt _ !)} 



(3.21) 



(3.22) 



Ideally we would proceed to find f r (t) from this by forming the Taylor series in z, but it turns out 
that there is no simple formula for the r th derivative. Instead we interpret the chain length distribution 
function f r (t)/ F(0,t) as a time-dependent probability density function; it is then straightforward to 
find how the expected chain length and standard deviation vary with time. Firstly, the total number 
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of chains N(t) = F(0,t) and so is given by ( |3.21| ). The total number of chains increases exponentially 



starting from f\ at t = 0. The expected length of chains grows with the same exponent: 



vm -PL — — d — 

K ' " N(i) F(0,t) <9z 



(A + e ) e <"? + 2(/ 2 - 6 2 )e 2t A 2 - A 2 -2/ 1 6 + 6 2 ^ 



2/i((/i + e)^-c) 

Finally the standard deviation (a = y/V) starts from zero and also grows with the same exponent, so 
that in the large-time limit the standard deviation has the same order of magnitude as the mean chain 
length: 



^ (r 2 ) (r) 2 I «9 2 F 



N N 2 F(0,t) dz 2 



dF 



2=0 



F(0,t) 92 



2 = 0/ 



~ T2 - l ) (f tSl + 7 + e ~ 2tfl + 3e " 4t/l2 ) + ( e )- ( 3 - 24 ) 

None of these quantities blows up in finite time, but they all tend to infinity as t — > oo. The standard 



deviation (y V(t)) and the mean (E(£)) both tend to infinity exponentially with the same exponent. 
Thus after a large time there will be a wide variety of chain lengths present in the system. 

3.3 Discussion 

The model analysed in this section allows for the growth of long chains. We have shown that these kinds 
of Becker-Doring based models can exhibit a wide variety of behaviours, from the purely autocatalytic 



model of Section |3.1| which approaches a steady-state solution, through the steady slow growth in 
length and number of chains which occurs in the purely crosscatalytic template-based growth model 
of Section 3.2 , to the singular behaviour described in Appendix [A] for an alternative crosscatalytic 



template-based growth model where rate constants are proportional to chain length. In this last case, 
the number of chains and expected chain length both blow up in finite time. This behaviour would be 
moderated in the presence of hydrolysis (see Sections || and 

In the next section, we describe a way of analysing the behaviour of replicating RNA systems which 
allows different chain compositions of equal length to have varying concentrations. This will enable us 
to investigate whether some species (that is, RNA sequences) dominate over others and the possibility 
that certain species may become extinct. 
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4 Coarse-grained reductions of microscopic models 



In the previous section we have considered two massive simplifications to the model proposed in Section 
|H neither of which was capable of describing the effects of the enzymatic reaction mechanism. So far, 
all the solutions we have managed to find have been for situations where all chains of the same length 
have the same concentration regardless of their composition. These solutions have shown the dramatic 
effects, both qualitative and quantitative, which template-based and ribozymic catalysis can have on 
the kinetics of growth. 

In this section we will perform an alternative simplification which enables us to construct solutions 
which allow different chain compositions to have different concentrations. The main difference between 
the analysis of the previous two sections and this one is that here we will initially be returning to 
reversible reactions (although later we shall again impose irreversibility). We thus return to equations 
(|3.6|) - (|3.8|) , although as noted earlier we shall ignore the inhibition steps. 



4.1 Contraction 



The equations we start with are ( |3 . 6|) — ( p78|) . Our aim is to perform a coarse-grained rescaling to reduce 
the number of equations in the system in the same way as has been performed in micelle formation 
|]21j ], vesicle formation and generalised nucleation theory |E3|. The same procedure applied here 
to equations ( p.6| )-( ^8|) yields 
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Note that the Becker-Doring structure present in the original system of rate equations is not 
destroyed by this approximation [BTJ, |22|, [E3J. Here uj is an arbitrary sequence of length A, 7* 



represents the sequence of the first A nucleotide monomers of 7, and Y r -x the 1 &S ^ ^- 

It is inevitable that this coarse-grained contraction procedure hides some of the subtleties of no- 
tation: for example, the term x\ in the contracted flux does not determine the order in which the 
nucleotides were added-there is no longer a unique form for 7. Thus the contraction procedure inher- 
ently performs a limited averaging in 7-space (sequence space). 

As an a posteriori check that these procedures are sensible, we average this system over all possible 
chain compositions (a "7-average" of the system (|4.1| )- (|4.3| )). This gives exactly the same results as 
if we had taken a 7-average of the original system ( p.6|) ( p78|) and then performed the contraction 
procedure: the processes of ^-averaging and coarse- graining are commutative. Both end up yielding 

( Q A \ / °c 00 00 ^ \ ^ 

x r = 2L r -i — 2(4 )L r , L r = J x r x l — — — — x r +i J I e r + a r +ix r +i + ^ Xr,sX s + ^ ^ Cr,s,kX s x r +k ' 

^A r+ i / \ <j = l s= i k= i 



(4.4) 

Below we shall flesh out in more detail two of the simplest approximate schemes which this method 
produces. To aid the clarity of the resulting models we shall introduce some new notation and drop the 
tildes from the rate constants, on the understanding that they have been rescaled in the contraction 
procedure. 



4.2 Maximal contraction of the rate processes 

In this section we take the contraction procedure described in the previous section as far as possible. 
We take a value of A large enough so that there are no chains of length 2A. Thus we consider only two 
types of sequence: the monomer forms X\ and long chains with composition 7, x\. 

Consider a combination of crosscatalysis and auto-catalysis (that is template-based chain synthesis 
with errors), enzymatic replication, as well as the normal reaction mechanism; then there are four 
distinct ways to make a chain, described in Table 0. 

To simplify the notation, since we are only dealing with one length of chain (|7|=A = A + l),we 
shall number the types of chain, that is use Y n in place of Xl and y n in place of x^ for the corresponding 
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AXi Xl uncatalysed, with forward-rate coeff. = e 

AXi + XI # autocatalysis, with forward-rate coeff. = a 

AXi + X® # XJ + X2 crosscatalysis, with forward-rate coeff. = % 

KX\ + X2 + X® # 2X3 + X2 enzymatic-catalysis, with forward-rate coeff. = (. 

Table 1: The four mechanisms by which long chains are formed. 

concentrations. Then the equation for the rate of change of concentration y n is 

/ N N \ X 

y n = 2L = 2(xf- (5y^j e + ay n + x %> + Cl/n Vp) ' ( 4 - 5 ) 

where N = 4 A is the number of different chains (/3-D-ribonucleotide sequences) of length A. The 
parameter (3 determines the ratio of chains to monomers at equilibrium. The mechanisms listed in 
Table [I] are assumed to leave this ratio unchanged; the rates e,a,x,( only alter the timescale over 
which the system reaches equilibrium. The backward rates for the mechanisms listed in Table [l] are 
thus /3e, Pa, fix, PC respectively. 

The advantage of this model equation is that it can be solved in a number of different cases, and 
it is possible to find how an initially uneven distribution of ribonucleotide chains evolves in time - 
for instance whether certain chains die out, or if error-prone template synthesis (crosscatalysis) will 
even out the differences. The disadvantage is that the model does not contain any information about 
whether chains will tend to grow longer, or how expected chain length varies with time. 



4.3 Analysis of maximally contracted model 

There are two cases where we shall perform some simple analysis to show that the reduced model 
derived above generates meaningful results. We shall consider a uniformly growing solution - where 
all types of chain are present at the same concentration levels - and show that the concentrations 
increase. We shall then show that this solution is unstable to perturbations which favour the growth of 
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one species over another. The two cases we consider are firstly constant monomer concentration, and 
then constant density when the backwards rates can be neglected {(3 = 0). 



4.3.1 Prebiotic soup with constant concentration of ribonucleotide monomers 

We first consider a prebiotic soup which is provided with a constant supply of /9-D-ribonucleotide 
monomers. For simplicity we shall specify the monomer concentration x\ in such a way that 2xf = 1. 
If we then seek a solution of the form y n (t) = Y(t), independent of n, we obtain 



In order to solve this with the initial conditions Y(0) = 0, we make use of asymptotics with a, £, \ ~ 
(9(1) ^> e. The reaction starts on a timescale of ti = e A " 1 t, where Y = eY\ satisfies Y[ = (l + aYi + 
NxY\) x , hence 



This solution is valid until Y reaches 0(1), when the enzymatic reaction mechanism becomes significant 
and causes a slightly more rapid blow up. 

In practise however, we do not expect to find all species of RNA chain present in equal quantities. 
Hence in the real world this uniform solution must be unstable. We shall now show that it is also 
unstable in our model, and hence that our model does indeed predict the preferential growth of one 
chain type over the others. 

To analyse the stability of the uniform solution, we introduce a perturbation around it. Substituting 
y n = Y{1 + y n ) into ( [4.5|) and linearising we find 



Yy n = \Y(ay n + N(Yy n + X a + (Ya)(e + aY + N X Y + (NY 2 ) X - 1 -y n (e + «y + N X Y + (NY 2 )\ (4.8) 



where a = I] n =i Vn- Summing the above equation over the index n, it is possible to show that if 
S^o) = then a(t) = for all t. This enables the above equation to be simplified to 



Y = ( e + aY + X NY + (NY 2 ) X . 



(4.6) 
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where B = e + aY + xNY + (NY 2 . Thus the uniform solution is unstable whenever the difference in the 
square brackets is positive. For small times the uniform solution is stable, but at larger times it becomes 
unstable. If a < Nx/ (A— 1) then the instability does not set in until t ~ t c , but if a > iVx/(A— 1) then 
the instability occurs for times after 



^instab t c 



a + N X \ 
y\ J 



A-r 



(4.10) 

«A /J J 

that is, once the uniform solution has reached the size Y = F insta b = e/[a(X— 1) — Nx))- So we see 
that the instability occurs when the ratio of autocatalysis to crosscatalysis (equivalently, the ratio of 
error-free to error-prone template synthesis) exceeds a certain threshold. Put another way, for certain 
species (sequences) to grow whilst others become extinct, the template-based copying of RNA needs to 
achieve a certain specified level of accuracy. 

4.3.2 Prebiotic soup with constant total mass of ribonucleotide bases 

In the case of a prebiotic soup with a constant total quantity of ribonucleotide bases (another way of 
putting this is to say that the total nucleotide mass density, which may be present in monomers or 
chains, is constant), we eliminate the monomer concentration to leave 

/ N \ A / N N \ X 

y n = 2 (g-A^Vn) (e + ^y n + x Vv + CVn %» ) ■ ( 4 - n ) 

V n=l / \ p=l p=l J 

To find the uniform solution and determine its stability, we once again use the substitution y n = 
Y(l + y n ). Inserted into the above, this yields 

Y = 2(g - ANY) A (e + aY + X NY + (NY 2 ) X (4.12) 



i} n = -^L( g - ANY^ie+aY+xNY + CNY 2 )^ 1 [\(aY + (Y 2 N) - (e+aY +N X Y + N(Y 2 



(4.13) 



where we have already made use of the fact that a = Y2n=i Vn = for all time. The leading order 
solution to the former fl4.12|) is similar to the uniform solution in the constant monomer case above: 

Y= (^)( (i- f/f ! ) v.A- 1 » - 1 )- where ^-^H^WTWy < 414) 
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The uniform solution is then unstable for t > tinstab, where £i ns tab is given by ( |4.10|) but with t c 
determined by ( |4.14| ) rather than ( [4.71 ). 



4.3.3 Remarks 

It is possible to prove similar results for the case in which the backward rates are not neglected (/3 ^ 0), 
since all the analysis presented in the above section relies on a solution for which Y is small (Y ~ 0(e)). 
The solution remains valid until Y ~ 0(1). Since all the instabilities found above arise whilst Y is still 
(9(e), they will also arise in systems with (3 ^ 0, because in such systems the uniform solution and its 
stability criteria will not be altered by the (3 term until Y reaches 0(1). 

A more general notion of stability that we have not addressed is that for which the rate coefficients 
differ for different ribonucleotide chain sequences. By this we mean, for example, that different chains 
y n will have different autocatalytic rate constants, a n . A more detailed analysis could be carried out, 
and similar results to those presented above would be produced. The main difference is in the selection 
of which chains proliferate. In the case studied here, this is determined by initial conditions, but for 
cases in which the autocatalytic rate varies with chain composition, this provides another selection 
mechanism and those chains with the largest autocatalytic rates will grow at the expense of the other 
species. Thus perturbations to the autocatalytic rates accentuate the instability analysed above. 



4.4 Almost maximal contraction 

In the foregoing work, we have used a maximally-contracted Becker-Doring model to analyse the 
kinetics of formation of RNA chains. In order to derive a slightly more accurate model we can use the 
same theory as described in Section O] but instead of using a coarse-graining mesh so large that only 



monomers and chains are captured, we use a finer-grained mesh which includes two types of chains: 
short ones (of length A), and long ones (of length A + /i). Formally, the kinetic equations are then 
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This system has mass density given by 



£ = 4xi+ E A 4 7l + E (A + m)4 7] - (4.16) 

T-M=A 7:|7l=A+M 

Such a complex system is more easily understood if we generalise the notation of the previous 
section. In the special cases where fi = A + 1, which we shall study in more detail later, we simplify 
the notation for short chains x$ to y n , where n encapsulates the information stored in 7. Then we can 
write 

(N N,N N N,N 

e + ay n + J2xy P + E XV P ,g + E CVnVp + E Cy n y P , q + (4.17) 
p=l Pi</=1 p=l Pi9=l 

JV,JV N,N,N N,N N,N,N x A 

~l~ ^ ] Cym,nUp "i" ^ ' CUm^nUpq, "I" ^ ] ^Vn^mVp ~t~ ^ , CVn^Vp.q 
m,p=l m,p,g=l m,p=l m,p,g=l 

jv ( N N ' N N N ' N ' A 

- E ~ /%n,r) e + at/ njr + X E %» + X E + CZ/n,r E Vp + CZ/n,r E %>><Z 
r=l V p=l Pi9=l p=l Pi9=l 

AT / N N,N N N,N \ A 

- E (^1 _ A»J/r,n) e + ay r>n + X^V P + X E %>,9 + CVr,n E % + CZ/r,n E 
r=l y p=l Pi9=l p=l Pi9=l 

AT N.N N N.N 



(4.18) 



>,1 

p=l p,q=l p=l p,q=l 



This system of ordinary differential equations can be most readily understood in terms of the set 
of macroscopic chemical reactions comprising it. Let us use X\ to denote the monomer species, Y n to 
denote a short chain of type n, and Y m>n to denote a long chain whose composition is identical to that 
of a short chain of type m joined to a short chain of type n. The reactions which the above kinetic 
scheme describes are listed in Table 0. It will be recalled that, owing to the coarse-graining scheme 
applied, the quantities Y n and Y m) n represent "averages" over sets of chain lengths and sequences. 
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The constant quoted after each step in the reaction mechanism is the rate coefficient associated 
with that process. We shall not analyse this still very complex model here, however, as its complexity 
would be compounded by the subsequent introduction of hydrolytic fragmentation processes. Instead, 
the above model will be simplified further, and used in the following sections which include hydrolysis 
for the first time. 
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Y n + AX 1 



Y n 
Y„ 



e uncatalysed formation of short chains 
e uncatalysed formation of long chains 



Y n + AX, 



2K 



a autocatalysis of short chains 
a autocatalysis of long chains 



Y m + AX 
Y m , n + Y k + AXi 

Y m ,n + AXi 

Y m + Y k + AXi 



Y +Y 

1 m ~ 1 n 

Y m ,n ~)~ Yfc i 
Y m ,n ~)~ Yfc 
Y m ~{~ Yk l 



X crosscatalysis of short by short 

X crosscatalysis of long by long 

X crosscatalysis of short by long 

X crosscatalysis of long by short 



Y p + Y n + AX 1 
Y P) q + Y n + AX\ 
Y p + Y min + AX i 
Yp t q ~\~ Y m n -\- AX\ 
Y p + Y n ^ m + AX i 
Yp,q + Y n>m + AX i 



2Y n -\- Yp 
1Y + Y 

* 1 n ~ 1 p,q 

Y + Y +Y 

1 n ~ 1 m,n ~ 1 p 

Y + Y + Y 

1 n ~ 1 m,n ~ 1 p,q 

Y + Y + Y 
1 n ~ 1 n.m i 1 p 

Y + Y + Y 

1 n ~ 1 n.m T 1 p,q 



( ribozymic synthesis of short chains 

( ribozymic synthesis of short chains 

( ribozymic synthesis of short chains 

£ ribozymic synthesis of short chains 

( ribozymic synthesis of short chains 

£ ribozymic synthesis of short chains 



Y p + Y m>n + Y m + AX\ 
Yp^q ~\~ Y m n -\- Y m -\- AX\ 

Y p + Y min + Y n + AXi 
Yp,q ~\~ Y m n -\- Y n -\- AX\ 



2Y +Y 

" 1 m,n T Jp 

2Y +Y 

A 1 m,n ~ 1 p,q 

2Y + Y 
^ 1 m.n ~ 1 p 



2Y 



m,n 



Y 



P-M 



( ribozymic synthesis of long chains 

( ribozymic synthesis of long chains 

£ ribozymic synthesis of long chains 

( ribozymic synthesis of long chains 



Table 2: Reactions described in the kinetic equations ( [4.171 )— ( |4.18|) , together with their 
forward rate constants and a brief description of each. 
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To conclude this section, we have shown that a systematic coarse-grained contraction procedure 
can be employed to greatly reduce the number of rate equations and different intermediates that need 
to be considered. Such a procedure yields equations which are amenable to theoretical as well as 
numerical analysis and we shall show in Section |5] how the most drastic approximation-the maximally 
contracted system-can still account for the way in which a combination of template-based synthesis 
and ribozymic replicase activity amplifies initial differences in RNA chain concentrations. In Section ||, 
we shall return to the 'almost maximally contracted system' when incorporating hydrolysis into the 
theoretical description of RNA chain replication. 

5 RNA formation including hydrolysis 

From a chemical point of view, a crucial omission from all of the foregoing models is hydrolysis. This is 
the process by which polymeric ribonucleotide chains are prevented from growing to unlimited lengths. 
Many of the models analysed above have the unfortunate property of producing infinitely long chains 
in a finite time. This implies the presence of mathematical singularities within the kinetic equations 
and by itself indicates that certain essential physicochemical processes have been overlooked. 

So far, our general models of RNA formation have been based on the Becker-Doring theory, whereby 
chains grow or fragment through one step addition or removal of /9-D-ribonucleotide monomers. For 
more general coagulation-fragmentation processes, in which arbitrarily large amounts of polymer may 
be added or removed, one can use a description based on the general Smoluchowski equations (]5.1|) 
instead of the Becker-Doring equations. While the Becker-Doring theory evidently works well for the 
stepwise growth of RNA chains, hydrolysis can only be accounted for by a general fragmentation 
term, since the loss of a single monomer from the end of an RNA oligomer is an exceedingly poor 
approximation to hydrolysis. Therefore, in the models described below we shall retain the original 
Becker-Doring assumptions when modelling the aggregation mechanism, but we shall include a general 
Smoluchowski fragmentation term to handle the effects of hydrolysis. 

We wish to emphasize that we do not make any particular assumptions about the nature of hydrol- 
ysis. We certainly do not seek to describe it in terms of equilibrium proceses (which would enable an 
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arbitrary hydrolytic chain fragmentation to be equivalently described in terms of a sequence of step- 
wise processes, and to which Le Chatelier's principle could be applied). Rather, hydrolysis is modelled 
explicitly by a distinct kinetic process which splits a chain in two at an arbitrary point and is thus of 
general Smoluchowski fragmentation form. 

In the rest of this section, our model of possibly catalytic Becker-Doring chain growth combined 
with a general fragmentation term will be analysed by the use of generating function and coarse-grained 
reduction techniques. 

5.1 Formulation of the Smoluchowski coagulation-fragmentation equa- 
tions for RNA formation with hydrolysis 

The full Smoluchowski coagulation-fragmentation equations for the rate of change of concentration of 
a polymeric chain of length r are 

r— 1 oo 

Cr = \ J2 W k,r-k ~ W r,k, r = 2, 3, . . . 

k=l k=l (5.1) 

where the rate coefficients for the aggregation and fragmentation steps a riS ,b riS respectively now carry 
two suffices, indicative of the fact that clusters of sizes r and s can reversibly coalesce to form a cluster 
of size (r + s). We shall analyse two versions of this system: in one, the monomer concentration c\ will 
be assumed constant; in the other, the monomer concentration will be allowed to vary, and the density 
q = Y^T=i rc r w iU De ne ld constant. 

To recover the original Becker-Doring equations from ( |5.1|) , we impose the condition that there can 
be no polymer-polymer aggregation, so that a ribonucleotide chain can only aggregate via one monomer 
at a time. Thus we impose the following form 

on the forward coefficients a ryS (note that this implies an = 2ai); where, 5ij is the Kronecker delta 
satisfying Sij = 1 only if % = j and is zero otherwise. 
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Just as template-based catalysis was included in the Becker-Doring model of Section |3], it can be 
incorporated within this more general coagulation-fragmentation model. This is achieved by replacing 
a r>s above by a r>s + J2kLi fr,s,k c k, where f r>s ^ is another set of rate coefficients for the template synthesis 
of oligomeric /?-D-ribonucleotides. 

We shall make extensive use of generating function techniques, and choose simple forms for the 
coefficients a rjS ,6 riS so that these techniques may be used to their full potential. Initially, in the same 
manner as in equation ( |3.19|) , we define the generating function 

oo 

C{z,t)=J2c r {t)e- zr , (5.3) 

r=l 

which can be viewed as a discrete Laplace transform. This transforms the problem from having one 
discrete and one continuous independent variable (r, t) to two continuous variables (z,t). Note that 
putting z = in the above expression gives the quantity C (t) = C(0, t) which is the total number of 
chains in the system. A second important quantity in the analysis of these problems is 

u(z,t) = -^ = J2rc r e- zr } (5.4) 

OZ r =l 

and inserting z = in this yields the density, g = u (t) = u(0, t). 

In the main, we shall choose constant rate coefficients, although forward rate coefficients of the form 
a r = a + ar can be analysed without much extra difficulty. Two types of backward rate coefficients can 
also be analysed: (i) b TyS = b, which we shall concentrate on, and (ii) b r ^ s = b/(r + s — 1) which we do 
not consider further here. The choice of a constant hydrolysis rate implied by (i) is a reasonable first 
approximation; the choice (ii), where long chains have a smaller chance of undergoing hydrolysis than 
short ones, is less tenable. 

As was done earlier in the paper, two types of prebiotic scenario are analysed below. The first 
has constant mass density p, for which a simple closed system of equations can be generated for the 
monomer concentration (ci(i)) and the total concentration of polymer chains including monomers 
(C (t)). These give some indication of the polydispersity of the chains present in the soup since the 
average RNA chain length is then equal to 
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Typically we are interested in the evolution of the system starting from an initial condition where all 
the mass is in monomeric form; that is Ci(0) = g, c r (0) = Vr > 2 which implies Cq(0) = g. Systems 
of this type are analysed in Sections ^]2] and |573| . 

In Section |5.4| , a second prebiotic scenario is analysed, which has a fixed monomer concentration 
(ci), so that the total ribonucleotide density uo(t) may vary. This is equivalent to the pool chemical 
approximation. In this case we can form a simple closed system of equations for the density uo(t) and 
the total number of polymer chains Co(t). 

In all situations we shall initially consider, the only type of template-based synthesis we analyse 
involves error-prone replication or crosscatalysis (in fact, pure autocatalysis cannot be analysed using 
generating function methods as the type of nonlinearity which it introduces is not amenable to transform 
techniques). For this reason we have not given the full formulation of the problem, including details of 
individual sequences, as that would introduce autocatalysis both directly and as part of the ribozymic 
replicase polymerization mechanism. However, these more complex mechanisms will be introduced in 
the next section (Section |) when we consider massively contracted models. 

5.2 Constant monomer concentrations; pure template-based RNA syn- 
thesis 

In this section we shall reconsider the case analysed in the Appendix which has forward rate constants 
proportional to chain length; there, the total mass in the system and the average chain length both 
blew up in finite time. We now show that the incorporation of hydrolysis into the model prevents such 
unrealistic behaviour. 

Whereas an RNA chain can grow only by adding a single monomer to one or other end of the chain, 
a chain can split into two parts at any position along its length. Since the general fragmentation term 
we have included gives each phosphate bond an equal chance of breaking, and the number of bonds in 
a chain is proportional to chain length, the reverse reaction rate coefficient is effectively proportional 
to chain length. Thus we expect that a general hydrolytic fragmentation mechanism with constant 
rate coefficients will be able to prevent unbounded chain growth in a polymerization mechanism with 
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forward rate constants proportional to chain length. 

The equations which describe pure template synthesis together with hydrolysis in a system with 
constant ribonucleotide monomer concentration are 

oo oo oo 

c r = f( r ~ l)sc s c r _ici - frsc s c r C! - \b{r - l)c r + ^ bc r+s . (5.6) 

s=l s=l s=l 

The generating function C(z,t) of equation ( |5.3|) is then determined by 

^ = fc 2 lUo e- z - fc lUo u(l - e- z ) - \bu + \bC + b Cl e~ z + b \ °"[^~ s ) , (5-7) 

where u = — in turn satisfies 

du du du 

fc\u e~ z + fciu —(l - e~ z ) + fdu ue~ z + \b— + \bu + bc x e~ z 



dt dz ' 2 dz 

'u(l - e - z ) + ( C - C )e 

:i-e 



6C7 e- - 6 ( U ^" C ^- r r~ M);C • (5i 



A closed system can then be formed for the total number of chains Co{t) and total mass of RNA chains 
uo(t) by letting z — > in (|5.7|) and ( |5.8|) , giving 

C = fc\u + \bu Q - ^bCo + 6ci (5.9) 
Uq = fc 1 ul + fcju -bCo + bc 1 . (5.10) 

In general, explicit solutions of this pair of equations are not available, but special cases can be analysed, 
and a fuller picture built up from these. 
The system has equilibrium solutions at 



^^[M- 1± iJ^-M +1 )' (5 - n) 

where this is well-defined, namely for bj fc 2 > 7 + v48 ~ 13.9 (for bj fc\ < 7 — ~ 0.07, both roots 
correspond to negative values of the density u ). 

5.2.1 The case 6 = 

For 6 = the system of two ordinary differential equations is solvable. Making the substitution 
uq = 1/w produces a linear equation which can be solved, and leads to 

uo(t) = - % - ; (5.12) 
2e 1 i — 1 
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thus the system blows up at time t = t g = log(2)//cf which, as is to be expected, is the result obtained 
in the Appendix, since 6 = corresponds to no hydrolysis. For small b there will be only minor 
modifications to this behaviour. 

5.2.2 The case b = b c = (7 + V3)/c? 

At the smallest value of b where an equilibrium solution exists, the equilibrium values of Cq and Uq are 
|(l + 2y / 3)ci and (l + 2/\/3)ci respectively. An expansion about the equilibrium point using 

C = fc 1 (l + 2V3)(l + <7), M = ci(l + |v / 3)(l + ^) (5.13) 

reveals the existence of one stable and one centre manifold W% - The existence of a centre manifold is 



75 IF = bfr^]H^ S+ (V3-> + M, <«■") 



to be expected since at b = b c the system undergoes a saddle-node bifurcation: 

-J (-(v / 3 + i)g + (v / 3-i)u.+ 

The eigenvector corresponding to the stable manifold is (^^g^)^ 3 ) and that corresponding to the centre 
manifold is (^3^/2)' This situation is semi-stable in that the equilibrium position is only stable from 
one side. If too much matter is inserted into the system, then the system's behaviour is divergent, 
gaining mass without bound. However, if the system is initiated with only a little matter present, 
it will evolve smoothly and slowly approach an equilibrium solution. The situation is summarised in 
Figure 1. 

Systems which start below the stable manifold - the lines marked W$ on Figure [TJ - are attracted 
to the equilibrium solution along the centre manifold (Wcr); however, systems initiated above W$ gain 
mass and are repelled from the equilibrium point. Thus the trajectory we are interested in (that starting 
from the initial conditions C(0) = C\ — ito(0)) does approach the equilibrium solution. 

5.2.3 The case b > 1 

For asymptotically large b, the system has a critical point near 

C ~ Cl (l + ^iV Mo ~ Cl (l + lMy (5.16) 
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Uq 




ci Co C 

Figure 1: Diagram of the (C ,u ) phase plane for the case b = b c . IC marks the initial 
condition imposed on the system where all nucleotide matter is in monomeric form and 
so both the mass u — ci, and number of objects in the system C — c±. Wc is the centre 
manifold emanating from the equilibrium configuration of the system, along which the 
system slowly moves. W s marks the stable manifold of the equilibrium point; its existence 
shows that the centre manifold is itself stable in that points close to the stable manifold 
are attracted to it. At equilibrium the number of chains C = Co and the mass u = u . 

The nature of this critical point is a stable node, so the initial conditions at Cq = Uq = C\ are attracted 
to this stable equilibrium configuration. (There is also a saddle point at u ~ &/3/ci, C ~ fe/9/ci.) 
This is intuitively in line with what we expect-6 very large implies that hydrolysis dominates the 
kinetics, making it almost impossible to build long chains; the few that are built rapidly break up, so 
that the system remains mainly in monomeric form. 

5.2.4 Summary 

This example shows that the inclusion of hydrolysis, according to which RNA chains split at any point 
along their length, provides a powerful mechanism which can prevent chains from growing infinitely 
long. With a constant hydrolysis rate (independent of chain length) the mechanism prevents infinitely 
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long chains forming in RNA systems in which the growth rate is proportional to chain length. 

In Section |3.2|, a model with coefficients independent of chain length was analysed and shown to 



have divergent behaviour in the limit t — > oo. In this limit, both the number of chains and mass 
of material in chains grow without bound in such a way that the expected chain length also grows 
unboundedly. In the Appendix it is shown that in models where rate constants are proportional to 
chain length, such a singularity can occur in finite time. 

This section has shown that if a significant degree of hydrolysis is included into such models, it 
prevents this unphysical behaviour. If the hydrolysis rate exceeds a threshold value, there is a stable 
equilibrium configuration to which the system will tend. 

5.3 Constant rate coefficients and constant nucleotide monomer concen- 
trations 

We now consider a more general model which includes both catalysed and uncatalysed RNA chain 
growth mechanisms. We assume that all reaction rates are independent of chain length, using / to 
denote the catalysed rate and a the uncatalysed rate of addition of nucleotide monomers to a chain. 
For r > 1, the equation 

oo oo oo 

C r = aCiC r _i - aCiC r + fc X C r - X X! C * ~ f C ^ C r ^2 C s~ \K r ~l)c r + b^2 °r+s, (5- 17) 

s=l s=l s=l 

governs the rate of change of concentrations. The generating function then satisfies 



= ac\e~ z + fclC e~ z - a Cl C{\ - e~ z ) - /ciC C7(l - e~ z ) + 

i,^ i, i . b(C e- 2z -C) 

+±bC - \bu + b Cl e~ z + 

1 1 (1 — e~ z ) 

where u = — ^ satisfies the partial differential equation 

On 

= Cie -z( Cl + C)(a + fC )-u Cl (l-e- z )(a + fC ) + 

at 



(5.18) 



i,du , „ ,^ „ b\(l - e~ z )u + e~ z (C - C )} 
+> + ¥q~ z + h ^ Z - b Coe- z - ^ ( ( _ e . z y ° A - (5.19) 

Again, these equations cannot be solved in general, but useful information about their solutions can be 
obtained by considering the limit z — > in which they reduce to a coupled pair of ordinary differential 
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equations: 

C = {fc\ - §&) C + ac\ + bd + \bu Q 

Uq = fciCl + (fc\ + aci-b)Co + ac\ + bci. (5.20) 

We seek an equilibrium solution of these equations by first solving the latter quadratic equation to 
find Cq and then the former linear expression which gives uo in terms of Co- Let us define 

B= * , X= / j, 2 , (5.21) 
aci + /cf aci + jcf 

so that 5 is the reciprocal of the equilibrium constant for the overall reaction (the ratio of the total 
hydrolytic fragmentation rate to the total polymerization rate), while x is a measure of the effectiveness 
of catalysis, since it is the ratio of the catalysis rate to the total aggregation rate (0 < \ < 1). Thus 
the equilibrium value of Co is given by 



°° = ^ ( B ~ 1 ± V^" 1 ) 2 - 4 *) • ( 5 - 22 ) 
Due to the form of the discriminant, if the roots for Co are real then they have the same sign. If B < 1 
then they are both negative or both complex and hence are not relevant for the large time asymptotics 
of the system ( |5.20| ). In this case the total number of chains and mass of material must both tend 



to infinity. If 1 < B < 1 + 2y/x> then the roots for Co are complex and hence again irrelevant when 
considering the large time asymptotics of the system ( |5.20p . However, if B > 1 + 2y/x then hydrolysis 
is strong enough to prevent such unphysicochemical divergences, and in the large-time limit the system 
may be attracted to such an equilibrium configuration. 

5.4 Constant rate coefficients and constant nucleotide density 

In order to consider the constant-density form of the equations, we need to construct a new equation 
for the monomer concentration which accounts for the loss of monomer as the ribonucleotide chains 
grow. With constant reaction rate coefficients, the kinetic equations are 

oo oo oo 

c r = acic r _i - acic r + jc x c r -\ ^ c s - fc\C r J^c s - \b{r - l)c r + b^c r+s (5.23) 

s=l s=l s=l 

oo oo / oo \ / oo \ oo 

ci = -aci - ac i^2 c s - fcl^2c s - fc! [^2c s ] [Y^ c s ] + b c 8+1 , (5-24) 



s=l s=l \s=l / \s=l 
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which can again be analysed using generating function techniques; from (|5.3|) 



= Cie - z (C - Co) (a + /Co) - Cl C(a + /C ) + fce^Co + \bC - \bu + ^-^--S 



In this case the equation for % is not so important for us since the imposition of constant density 
implies iio = 0. This leads to the following coupled pair of ordinary differential equations for the total 
number of chains and the monomer concentration 

C = -fc x Cl - (aci + i&)C + \bg (5.26) 
c 1 = -(a + fC )c 2 1 -(aC + fC 2 +b) Cl + bC . (5.27) 

Starting from initial conditions in which all the available ribonucleotide mass is in monomeric form, 
which implies Ci(0) = g, C o (0) = g, we expect the system to approach an equilibrium configuration 
with significantly less monomers, and fewer chains, but where the average chain length (r) := g/Co is 
much larger than unity. 

It is not possible to explicitly find equilibrium points of this system, but we can show that they 
exist and occur in the correct region of phase space. First note that Co = implies that at equilibrium 

Cl " 2C (a + /C,)- (5 - 28) 
Then the condition t\ = corresponds to finding roots of the function 

C(C ) := b(g - Co) 2 + 2C (g - C )(fC 2 Q + aC + b) - AC 3 (a + fC ). (5.29) 

From this it is possible to see that C(0) = b 2 g > and G(g) = —4bg 3 (a + fg) < so there is an 
equilibrium value of Co between zero and g; and from ( |5.28|) we can find a corresponding value of c\. 
In the limit b — > 0, it is possible to obtain the equilibrium solution asymptotically; in this case it is 

C Q = \g, c 1 = 3b/(3a + fg), (5.30) 

so the monomer concentration is small and the equilibrium distribution of chains has mean length 
(r) = g/C = 3. 
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The ribonucleotide polymer chain distribution can then be found by solving ( 5.25 ) with ^ = and 



u = — to give 



\e z {e 2z 



l) + fe 



C{ Zj A 

from which we can find the concentration distribution 

£2 r (r-l 



3„z+2e~ 



(5.31) 



(5.32) 



r ( r +l)!(r+3)' 

This corresponds to a pure aggregation Becker-Doring model where coagulation stops as c\ — > 0. 
Figure |2] shows this profile which demonstrates how effective even a small amount of hydrolysis can be 
in moderating the growth of long chains. 
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Figure 2: Graphs of (a) concentration and (b) log (concentration) against ribonucleotide 
chain length (r) for the case with constant rate coefficients and constant total nucleotide 
density in the limit 6 — > 0, i.e. for vanishingly small hydrolysis (so that c,\ = 0). 



6 Coarse-grained models of RN A formation including hydrol- 
ysis 

The most straightforward way to incorporate hydrolysis into the contracted RNA models is to modify 
the "almost maximally contracted" model from Section |4.4j , by adding extra terms to the right-hand 
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side of ( 4.17| )- fl4.18|) . By the term "almost maximally contracted" we mean a coarse-grained reduction 
which describes monomers and two types of chain: short and long. (Recall that we imply here sets 
of ribonucleotide sequences, not individual RNA polymer molecules.) Then we can add in one extra 
mechanism-namely hydrolysis-which replaces one long chain by two short ones. However, as noted 
previously, since hydrolysis violates the Becker-Doring assumptions that proscribe cluster-cluster in- 
teractions, we cannot a priori build hydrolysis into a full Becker-Doring model, and then perform the 
coarse-graining contraction. Instead we take an already contracted model and add hydrolysis a poste- 
riori. We return to the almost maximally contracted model described by the rate processes listed in 
Table 0, and denote short chains (of length A) by Y n , where the subscript n enumerates the variety 
of different possible compositions of chains (ribonucleotide sequences), and long chains (length 2A) 
by Ym,n- Here as before m represents the information carried in the first half of the chain and n the 
information in the second half. Thus the chain denoted Y m ,n is simply a concatenation of the chain Y m 
with Y n . 

We shall not analyse the full system of rate processes described in Table |2] as, using physicochemical 
insight, one can see immediately that some of the steps quoted there make a negligible contribution 
to the replication of RNA. The mechanisms which we aim to model are described in Table [3| The 
constant quoted after each reaction mechanism is the rate coefficient associated with that process. A 
number of possible mechanisms have been eliminated from our model; these are listed in Table |j. 

We assume that chains are subject to hydrolysis (which splits one long chain into two short ones), 
and that both types of chain (short and long) are grown by an uncatalysed mechanism as well as 
autocatalytically. We shall assume that chains can only act as crosscatalysts to chains of the same 
length or shorter; thus we neglect the putative mechanism by which long chains are formed with a 
short chain acting as a catalyst as the rate constant for this interaction is expected to be very small. 
This assumption is carried through to the ribozymic synthesis mechanisms where we assume that only 
longer chains can act as enzymatic catalysts. 

Another assumption made in the model is that the main effect of hydrolysis is to split long chains 
into short ones, and not short chains into monomers. The reaction mechanism Y n — ► AX± could be 
added to the scheme but would further complicate the equations. 
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Yn + AXi 

Yn + AX 1 

Y m ,n + Fn + AXi 

V 

J 777. 7). 



- 1 n 

V 

1 n,m 

2Y n 
2Y 

^ 1 mn,n 

Y +Y 
1 m ~ 1 n 



ft. 



uncatalysed polymerisation 

uncatalysed growth of long from short chains 

autocatalysis of short chains 

autocatalysis of long chains 

hydrolysis 



Y m + AX, 
Y m , n + Y k + AXi 

Ym'n + AXl 



F 4- F 

Y m ,n Fs,i 
Y m ,n Y^ 



Xs 

\l 



crosscatalysis of short by short chains 
crosscatalysis of long by long chains 
crosscatalysis of short by long chains 



Y P; q + Y n + AXi - 


-> ?F 4- Y 

^ ^ 1 n \ 1 p,q 


Css 


ribozymic synthesis 


of short chains 


Yp^q 4~ Y m n 4~ \X\ — 


^ Y + Y 4- F 

1 n ~ 1 m,n ~ 1 p,q 


CsL 


ribozymic synthesis 


of short chains 


Yp,q 4~ Fi,m 4~ AXi ~~ 


-> Y 4- F 4- F 
^ 2 n r - 1 n.m ~ 1 p,q 


CsL 


ribozymic synthesis 


of short chains 


Yp t q 4~ Y m ^ n 4~ Y m 4~ AXi ~~ 


^ ^ 1 m,n i - 1 p,q 


Cll 


ribozymic synthesis 


of long chains 


Yp^ q + Y mtTl + Y n + AXi - 


_> 9F 4- F 


Cll 


ribozymic synthesis 


of long chains 



Table 3: Reactions included in the almost maximally contracted model (|6.1| )— ( |6~T2D , together 
with their corresponding forward rate constants. Subscripts i S\ i L' 1 denote short and long 
chains respectively. 



The kinetic equations for the reactions listed in Table ^| are similar to Qi.17 )— ( 4.18Q ; although 
slightly simpler due to the elimination of some mechanisms, they have one extra term introduced by 
the inclusion of hydrolysis 

N N I N,N N,N x A 

m=l m=l y p,q=l p,q=l 

A 



N I N,N N,N 

^yn.mUp,! 



(6.1] 



m=l 



p,q=l 
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A 



Y m + Yk + AXi — > F m + Y^j x crosscatalysis of long by short chains 

Y p + Y n + AXl — > 2Y"„ + Y P ( ribozymic synthesis of short chains 

Yp + Y m ,n + AXi — > + im,n + C ribozymic synthesis of short chains 

+ ^n,m + AXi — * Y n + Y n>m + Y p ( ribozymic synthesis of short chains 

Y p + Y m , n + Y m + AXi —* 2Y m!n + Y P ( ribozymic synthesis of long chains 

Yp + Y mtn + Y n + AXi - * 2Y m n + Y p ( ribozymic synthesis of long chains 

Table 4: Reactions not included in the model equations ( |6.1| )— ( |6T2| ). 

(N N,N N,N N,N,N 

e s +a s y n +Y^ x s y m + x x y P , q + C ss yp,qyn+ Y Q-Ly P ,q(y m,n~^~yn,m 
m=l p,q=l p,q=l m,p,q=l 

(N,N N,N \ A 

+ 2 x L y P , q + Y C^Vm,ny P ,q ■ (6.2) 
p,q=l Pi9=l / 

As a first approximation all the reactions are taken to be irreversible. Another way of interpreting 
this is to say that the dominant chain-shortening mechanism is hydrolysis; to leading order we ignore 
the reversibility of the stepwise growth mechanisms (this corresponds to setting /5 2 = = /3 3 ). 

Initially we will analyse the model with constant ribonucleotide monomer concentrations, mainly 
because this further simplifies the system, but later we examine the constant density system. The 
large number and complicated nature of the nonlinear terms present in this system of equations mean 
that a general solution cannot be found from analytic techniques. A full solution is only available 
using numerical methods; however, even this is fraught with difficulty due to the combinatorially large 
number of equations in the system. Here we shall study special solutions analytically in order to 
determine certain important generic features of the kinetics allowed by these equations. As in the 



analysis presented in Section [4.3| , we shall look for uniform solutions in which all concentrations of 
short chains are equal, and all concentrations of long chains are equal. Once certain properties of such 
a solution have been established, we prove that these uniform solutions are unstable to perturbations 
which cause some types of chain to grow at the expense of others. 
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6.1 Constant ribonucleotide monomer concentration 

If the monomer concentration is taken as constant (pool chemical approximation) then the system of 
rate equations is slightly simplified. For simplicity we shall scale the concentrations so that x\ — 1. As 
noted above, these assumptions imply that 02 = = (3%. 

6.1.1 Uniform solution 

To find this simple uniform solution, for which the concentrations of all short chains are equal, and 
similarly for the long chains, we assume that all the short chains y n (t) have the same concentration (say 
Y(t)) and all the long chains another concentration, Z(t) = ym,n(t). With these assumptions the very 



large system of rate equations (|6.1|)-(|67^) reduces to a coupled pair of ordinary differential equations 



Y = 2r ] NZ+(e s + as Y+x s NY+ Xx N 2 Z^; ss N 2 YZ+2( SL N :i zf 

-2NY (e^Z+ X ,N 2 Z^N 2 Z 2 ) A (6.3) 
Z = -rjZ + 2Y (e^Z+ X ,N 2 Z+C^N 2 Z 2 ) A (6.4) 



It is not possible to find a steady-state solution of these equations, since setting Z = in equation ( |6.4| ) 
implies that the flux from long to short chains caused by hydrolysis exactly balances the growth of 
long from short. Such a balance in (|6.3| ) in turn implies that Y must be strictly positive, removing any 
possibility of a physically relevant equilibrium configuration existing. Indeed, as this system evolves, the 
concentration of long chains grows without bound, and the concentration of short chains decreases to 
zero. The catalytic properties of chain growth dominate the system, leaving hydrolysis as a negligible 
effect, to the extent that there is a finite time singularity. The leading order balance in the first 
equation (fT3|) is 2 A C s A L iV 3A Z 2A ~ 2Y(^N 2A Z 2A , implying that as Z -> oo, Y ~ l/y/Z. Then Q 
implies Z ~ Z 2X , and so Z ~ (1 — t/t c ) 1 ^ 2A_1 ^. The strong catalytic properties of long chains coupled 
with irreversible polymerization reactions cause the rapid growth in the concentration of long chains to 
occur. If we were to introduce reversible reactions (fa 7^ 7^ /3s) then the singularity would be removed. 
This modification would also cause a uniform steady-state to appear, in which all the concentrations 
of short chains were equal, and all the concentrations of long chains were equal. 



38 



A singularity of the form described above is also present in the full system of equations (|6.1|) -(j6.2|) 
with (3 2 — = (3%. This case is thus rather unphysical, so we shall turn to the situation where density 
is conserved - an assumption which is more physically relevant and will also eliminate the singularity 
encountered above. 



6.2 Constant total concentration of ribonucleotides in the soup 

The full kinetic equations for the case where all the catalytic growth reactions are irreversible {fii = 
(3$ = 0) and density (g) is kept constant are 

(N N,N \ A / N,N N,N \ A 

p=l p,q=l J \ P,<3=1 P,<?=1 / 

(6.5) 

N 

Vn = 53 V(ym,n + yn,m) (6.6) 
m=l 

N / N N,N \ A / N,N N,N 

~ Vn \ Q ~ A 53 %> - 2A £ VpA £ l + a L 2/m,n + £ X L 2/ M + £ C LL 2/p,g2/m 

m=l V p=l p,q=l I \ p,q=l p,q=l 

N / V iV,iV \ A / AT,AT AT.JV x A 

- S I - A 2 j/ p - 2A 53 2/ M 1 £ L + a L 2/n,m + 51 X L 2/p,9 + 51 C LL %>,?2/n,r, 
m=l V p=l Pi<?=l / \ P)9=i Pi9=l 

(V Af.Af \ A / AT N,N N,N N,N,N 

0-A532/p- 2A 53 VP*) [ £ s+ a S yn+J2x s ym+^2 X x y P ,q+J2 ( SS yp,qyn+ 53 C SL 2/p,<? (2/m,n+ y n ,r 
p=l p,q=l I y m=l P,9=l P><?=1 m jP><7=l 

where it will be recalled that A = 4 A , the number of sequences of length A. Of course, even this set of 
coupled equations cannot be solved exactly either analytically (due to its nonlinearity) , or numerically 
(since it is NP hard). However, we can make important progress by using theoretical methods to study 
certain special solutions and their local behaviour. 

6.2.1 Approach to the equilibrium solution 

The simplest solutions of all are the time-independent ones, and in this case there is a family of 
equilibrium solutions. These equilibrium solutions are degenerate, in that only short chains are present; 
there are no monomers or long chains. It is thus a genuine equilibrium state since there are no fluxes 
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present in the system. The reason for the time-independent solution being a true equilibrium solution 
rather than a non-equilibrium steady-state is that once all monomers have been converted into chains 
(whether short or long), no further growth can occur; the only remaining process is hydrolysis, which 
converts all the long chains into shorter oligomers. Hence as t —>■ oo, the system approaches 



Vr, 



0. 



x\ = 0, y m > such that 



N 
m=l 



Q/A, 



(6.7) 



which is an arbitrary distribution of short chains satisfying mass conservation. Thus the kinetics allowed 
for in the model permit highly non-uniform equilibrium solutions, including the extinction of species 
(that is all the longer chains, and any number of the shorter chains). 

Expanding the dynamical behaviour about the equilibrium solution, we can find the late-time 
asymptotics which tell us how such a self-reproducing soup of ribonucleotides approaches an equilibrium 
state. The quantities y m! n(t),Xi(t) both tend to zero, and we put y n (t) = y n — h n {t) with h n (t) — > as 
t — > oo. To leading order, the quantities h n ,xi,y m , n then satisfy 

N N.N 



Vm.n 



h r , 



a J>«- 2 E v™ 

\q=l p,g=i 
-r]ym,n + e A A A (y m + y ri 

N 

-V (ym,n+yn,m) ~ A A I ^ K - 2 ^ y p>l 
m=l \9=1 P>9 = 1 



N N,N 

K - £ vp,i 

.q=l p,q=l 



(6.8) 
(6.9) 



N 



N.N 



Xs0 

A 



2Nsty n 



(6.10) 



As in our earlier study of self-reproducing vesicles [22|, the phase space trajectories followed by 
the mathematical solutions of these kinetic equations are not strongly attracted to an equilibrium 
configuration; rather the dynamics close to equilibrium occurs on a centre manifold, where nonlinear 
terms dominate the behaviour. Motion on this manifold occurs slowly: relaxation does not follow an 
exponential decay with a characteristic timescale (relaxation time); rather the concentrations vary as 
inverse powers of time, with exponents —1/A for short chains and monomers, and —A/ A for long chains. 

To explicitly find the nature of the approach to equilibrium, we substitute 



Xi 



(t) X\t ^ , /l n (t) h n t ^ , ym,n(t) ym,nt ^ j 



(6.11) 
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into equations ( |6.8|) - (|6TT0|) . This yields 

E^A A H A (y m +y r , 



xx = AH, 



h, 



\A A H A 



2ge 



A 



+ [£ s +a s y n - 



A 



(6.12) 



where H = J2q=i h q . The latter expression enables H to be found: 



XA 



A 



q=l 



A 



(6.13) 



These results show that the short chains with largest equilibrium concentration y n also have the 
fastest growth rates near equilibrium, since the larger y n , the larger h n . Note, however, that this is 
only a local analysis: we have only found how the system behaves once it is close to any one of its 
degenerate equilibrium solutions; we have not yet discovered anything about which equilibrium solution 
is approached, or how it evolves from an initial condition towards equilibrium. To gain more insight 
about earlier times in the evolution of the system, we now analyse a special solution-the uniform 
solution. 



6.2.2 Uniform solution and its stability 

We now turn to analyse more general far from equilibrium kinetics; in particular, we shall be concerned 
with investigating whether the uniform growth of all chains is a stable configuration. The Ansatz 



Vn 



Z(t), y m {t) = Y(t), 



(6.14) 



when inserted into (|6 . 5|j (|5TBD , implies 



Z = - V Z + 2Y (q-ANY-2AN 2 z) A (e^Z+x,N 2 Z^N 2 Z 2 ) A 
Y = 2r]NZ-2NY (q-ANY-2AN 2 z) A (e^+a^Z+x^N 2 Z+(^N 2 Z 2 

+ (^AiVF-2AiV 2 z) A (e s +a s Y+ Xs NY+ Xx N 2 Z+C ss N 2 YZ+2(^N 3 Z 2 ) X . 



(6.15) 



(6.16) 



These equations are not explicitly solvable in terms of elementary functions due to the number and 
strength of the nonlinear terms present. The Ansatz has, however, removed the NP hard complexity of 
equations (|6.5|) - (|6.6|) so, if desired, a numerical solution of (|6.15|) -( |6.16|) could be found using standard 
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methods. We shall not pursue this here, as it is our aim to show that the solution is in fact unstable 
and so will not be realised. The presence of an instability means that any small perturbation of the 
solution will grow as time progresses, and the full system ( |6.5|) -( |OD will not approach the equilibrium 
solution of (|6.15|) - (|6.16|) . In our system of equations we shall assume that the instability is initiated 



by slightly nonuniform initial conditions; in a more general model the instability could alternatively be 
caused by different chain types having slightly different reaction rates. We shall also demonstrate that 
the instability is effective at earlier times for certain choices of parameters, indicating that the uniform 
solution will not be manifested in the solution of (|6.5|)-(|Er6"|). 

To illustrate this instability mathematically, we perform a linear expansion about the uniform 
solution, putting y m ^ n = Z + Zy m ^ y n = Y + Yy n , with y n ,y m ,n < 1, into (p3|)-(^6|). At leading 
order, this produces 

N 

v n = A (t)yn + Bit) (y ) (6-17) 

m=l 

y m ,n = C(t)y m ^ n + D{t)[y m + y n }. (6.18) 

where 

Ait) = -^^ + l\iX-l)ia s + N 2 C ss Z)Y-(e s +N Xs Y + N 2 x^Z + 2C s ,N 3 Z 2 ) 



Y Y 

x ^ANY-2AN 2 Z) A (e s +a s Y+ Xs NY+ Xx N 2 Z+(: ss N 2 YZ+2C^N 3 Z 2 ^ 1 



X 



B{t) = -^~AZia, + C^N 2 Z)[g-ANY-2AN 2 Z) [e^Z+x^N 2 Z^N 2 Z 2 ) + 

\C N 2 Z 2 A A-l 

+ J^sl_ (^Q—ANY—2AN 2 Z^j (e s +a s Y+ Xs NY+x^N 2 Z+C ss N 2 YZ+2C SIj N 3 Z 2 ) 

C(t) = ^ ((^ANY-2AN 2 Z) A (e.+a.Z+x.N 2 Z+(^N 2 zf [\Z(a h +C hh N 2 Z) - (e L + x L iV 2 z/ 
D(t) = ^(^ANY-2AN 2 Z) A (e,+a L Z+ X ,N 2 Z^N 2 Z 2 ) A . (6.19) 

Now it can be shown that if J2m=i V™. — = X^=i %,q at an Y instant in time, then these quantities 
are zero for all time. Since it is sufficient to consider perturbations of such a form, this result has been 
used to simplify the above expressions. 

Since equations (|6.17|) - (|6.18|) are linear in y n ,y m ,n, the temporal dimension of the problem can be 
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separated from the compositional part by expressing the solution in the form 



Vn(t) = £ n S(t), y m ,n(t) = £ m , n L{t). 



(6.20) 



This approach introduces a new parameter into the problem - the separation constant K, where 
£ m n = K(!; m + £ n ). Proving that the uniform solution is unstable then reduces to demonstrating that 
the functions L(t), S(t) grow as time (t) increases. These functions satisfy the differential equations 

/ 6 \ / A US OAF jy TD { 4-\ \ I , 

(6.21) 



V 



S \ I A(t) 2NKB(t) 
L ) \ D{t)/K C{t) j y 

From this it is possible to deduce that the uniform solution is unstable for certain choices of the 
parameters. A detailed analysis is hindered by the fact that the uniform solution for Y(t), Z(t) (|6.15 )- 
Q6.16 ) cannot be explicitly determined. 

The simplest part of the reaction kinetics to analyse is the approach to equilibrium, and this is 
where the most rigorous results can be obtained. We shall discuss this stage of the reaction first, and 
then proceed to derive more general and approximate results for earlier stages of the polymerisation 
scheme. 



6.2.3 Instability in the approach to equilibrium 



The only equilibrium solution of the system (|6.15| )- (|6.16j) is Z = 0, Y = g/NA which implies x\ = 0. 
The approach to this state is governed by 

Q 



xi(t) ~ Aivyir 1/A , Y(t) 



AN 



Fir 1/A , Z(t) ~ z 1 r A/x , as t -> oo, 



(6.22) 



where Y 1 , Z 1 > can be found either from an expansion of (|6.15|) -( |6.16|) about the equilibrium solution, 
or by inserting the Ansatz ( 6.14 ) into ( |6.12|) -( |6.13| ); they are thus determined by 



= \A A N A (2ge2 + A(6 



AN 



A 



2gA x N x e A Y 1 A 



V 



(6.23) 



An examination of (|6.17|) - (|6.19|) in the limit t — > oo highlights one condition for an instability to 
exist. In this limit, the kinetic equations simplify to 



/V 



t A ' x y n = A y n + 2(ANY l£ J A £ (y m ,„+y„, m ), 



y-m,n 2^1 (y™ y™ ym,n) 



(6.24) 



m=l 
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where 



A = (NAY! 



(A-l)a s 



X s 



NA 



A 



A-l 



4A^ y 



(6.25) 



Separating the temporal and compositional variables, as in ( |6.20| ), we obtain a simplified system of 
equations which can be written as 



t A/x ^ + (vt A/X -Ao)^- V[A + 2N(e,ANY 1 ) A ]L = 0. 



(6.26) 



Unfortunately it is not possible to solve this equation analytically, but the closely related system 

(6.27) 



rl 2 T HT 
* df2 + fa* - A ^ ~ ^ + 2iV(£ L AA%) A )L = 0, 



can be solved in terms of hypergeometric functions (see Abramowitz & Stegun |28] for details). This 
approximation is easily justified by noting that the range of chain lengths we are interested in is A = A+l 
to be approximately 50. The solution of ( |6.27| ) can be written as 

L{t) = K X M (-2AT(£ L AATF 1 ) A - A , -A , - v t) + K 2 U (-2Ar(£ L AATF 1 ) A - A , -A , - V t) , (6.28) 

where K\, K 2 are constants of integration and the functions M(-), U(-) are as defined in Chapter 13 of 
Abramowitz & Stegun [ 2~8| . This solution grows in the limit t -> oo if A + 2N(e Ij ANY 1 ) A > 0. This 
condition can be rewritten as 



(A-l)a g 



N 



e a A 



Q 



+ X S 



NA A 



(6.29) 



Thus the instability depends on the autocatalytic rate being greater than the sum of the uncatalysed 
rate and crosscatalysed (copying-error) rate by a factor which depends on the number of chains. 

Qualitatively, this agrees with our intuition: in order for perturbations to grow, the autocatalytic 
rate should be greater than the total rate at which error-prone copies are produced (whether that be by 
the uncatalysed or the catalysed mechanism). 



6.2.4 More general instability analysis of the uniform solution 



Formally, the Routh-Hourwitz criteria [^9j for ( |6.21D are that the system is stable if A + C < 0, and 
AC > 2NBD. Therefore, to prove that the system is unstable our aim is to prove at least one of 



A + C > 0, 



AC < 2NBD 



(6.30) 
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holds. However, since A,B,C,D are functions of time (t), results based on these inequalities rely on 
interpreting stability in the looser manner of 'as t increases the perturbation grows', rather than a 
strict argument in the style of 'as t — > oo, S(t), L(t) ~ fit), for some given f(t)\ More rigorous results 
in the limit t — > oo have already been given in Section |6.2.3| . 

One interesting point from a mathematical perspective is that the stability criteria do not depend 
on the separation parameter K. Thus the bifurcation where the transition from stable to unstable 
behaviour occurs is degenerate in that many modes of the system change stability at the same point 
in parameter space. 

Routh-Hourwitz analysis confirms the presence of an instability as the system ( |6.21|) approaches 
equilibrium, but since we are now working with a slightly different definition of stability and have not 
made the approximation by which ( |6.26| ) is replaced by ( |6.27| ), we obtain a slightly different inequality 

« s > (A _ 1)g ■ (6-31) 

This inequality can be interpreted in a number of different ways; for example, it sets upper limits on 
the uncatalysed growth rates for chains, e s . Another interpretation is that it prescribes an upper limit 
on the error-prone self-replication rates x s - The limits, however depend on the chain length (N), so 
a third explanation is that it sets a maximum on N, the length of chain which can successfully self- 
replicate and maintain its concentration at a significantly higher level than other chains of the same 
length. Finally, it prescribes a minimum for the autocatalytic rate constant above which the differences 
in concentration between chains grow. 

We progress now to apply the inequalities fl6.30| ) to special cases of the system fl6.17| )- Q5.19p , since 
it is then simpler to draw more specific conclusions. The special cases we consider are (i) systems 
with only crosscatalysis and autocatalysis, and (ii) systems where ribozymic synthesis dominates chain 
polymerisation. Of course in the RNA world both processes occur simultaneously and interact with 
each other; only an analysis of the full system ( 6.17 )— (|6.19|) will reveal how these mechanisms affect 
each other, but some useful preliminary insight and results are gained from analysing the processes 
independently. 
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6.2.5 Effect of template-based catalysis on the uniform solution 



To analyse the effect of standard template-based synthesis on the process, we set all the other rate 
constants to zero, £ L = e s = C ss = C SL = C LL = leaving only a L , a a , Xs , X x , X L , V 0- We then solve 
the model subject to initial conditions in which only a few chains are present (0 < Y(0),Z(0) <C 1). 
The condition A + C > can then be simplified to 

A-l 



a s Y + N Xs Y + N 2 x x Z) (A- l)a s Y -N Xs Y -N 2 X ^Z 



+ 



(6.32) 



+2Y 2 Z\a^ + N'x.nW- N 2 X J > 



This inequality is highly suggestive of the simple conditions 



2Nr]Z 



(g-ANY—2AN 2 Z) 



a • 



(A - l)a s > N Xs 



Aa L > N Xl 



(6.33) 



Since the right-hand side of (|6.32| ) is positive, one or both of the terms on the left-hand side must be 
too. Requiring the first term to be positive implies the first part of ( |6.33j ) and the second term in (|6.32|) 
yields the second. If both parts of equation ( |5.33j ) hold then for small enough rj an instability will exist. 
Such inequalities are fully in accord with earlier results ( (|6.29| ) and ( 6.31|) ) as well as our intuition 
that autocatalysis should dominate crosscatalysis in order for distinct species of chains to emerge and 
remain viable. 

The other condition (AC < 2NBD) can be rewritten as 

A-l 



A« L -iV 2 XL (\-l)a s Y-N Xs Y-N 2 Xx Z 



(a s Y + N Xs Y+N 2 Xx Z) ~ + 

NrjZ [(a+ak 



(6.34) 



+ Nka l YZ\a lj + N 2 xX < 



N 2 X, 



(^ANY-2AN 2 Z) A 

which shows that the presence of hydrolysis (the r] term) can cause an instability. Note that while the 
previous inequality gave an upper limit for 1], this inequality prescribes a lower bound. At this stage, 
it is perhaps worth repeating that only one of the two inequalities ( |6.32| ) and ( |6.34| ) need be satisfied 
for an instability in the system to exist. 

Simple inequalities such as (|6.31|) and (|6.33| ) are actually quite stringent; since is exponentially 
large, the autocatalytic reaction rate must far exceed that of the crosscatalytic rate. It is helpful to de- 
fine a quality factor Q for these template-based chain synthesis mechanisms. There are A^ crosscatalytic 
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processes in our model, each operating at the rate Xi om y one °f which produces an accurate copy of 
the catalyst, and there is one autocatalytic mechanism which is taken to provide an exact replication 
of the catalyst, at rate a. Thus from all the template-based catalytic chain synthesis processes the 
proportion of accurate copies is 

Q = a + X , (6.35) 

This quantity can be defined for the formation of short or long chains. The inequalities ( |6.31| ) and 
(|6.33|) can be recast into conditions on Q, in the manner of Nuno et al. |T(|, and in this form, they do 



not appear so demanding. For example, the second of the inequalities in (|6.33|) becomes 



By considering these template-based catalytic processes along with uncatalysed RNA polymeriza- 
tion (that is, ignoring ribozymic synthesis) it is possible to make a crude estimate of the timescales 
over which RNA chain formation occurs. Integrating a simplified but dimensional version of ( |6.3| ) 



Y = g A (e s +a s Y)\ (6.37) 

we find 

(6.38) 



(A-l^-V 

Taking a chain length of A = 10, a concentration (density) of g = 10~ 3 M and a = 10 5 e s , we obtain for 
the possible number of chain sequences N = 10 6 . The timescale ( |6.38| ) then gives a quantity which is 



consistent with RNA chains forming in months-years, if the value chosen for e s is taken to lie in the 
range 3.5 — 4.5 M~ 1 sec~ 1 / A . A much faster timescale is obtained if one calculates the rate of divergence 
from the uniform solution according to A(t) in equation ( |6 . 1 7[ ) . Thus the model shows that the selection 



of one species over another happens on a much faster timescale than the growth of chains. These 
estimates are crude due to their reliance on raising the reaction rate e s and density g to the power A; 
for example, with A = 10, an alteration of a parameter by a factor of two yields a factor of 2 10 « 10 3 
in the final approximation for the timescale. 
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6.2.6 Effect of ribozymic synthesis on the uniform solution 

It is harder to derive concise results from the case of ribozymic synthesis since these mechanisms intro- 
duce higher nonlinearities and coupling between different chains than the usual catalytic mechanisms. 

In this case we set all catalytic and normal rate constants to zero (e L = e s = a L = a s = x s = X x = 
X L = 0) and keep only the hydrolysis and ribozymic synthesis rates nonzero (r), ( ss , C SL , CmJj Table 
|3|. The condition A + C > (equation (|6.30|) ) then reduces to 

2X^N 2 Y 2 Z A+1 + (C SS Y + 2Q^NZ) X - 1 [(X-1)C S Y -2^NZ] > '- x , 

(6.39) 

which clearly fails when Z is small and when the monomer concentration is small, since in both these 
cases the right-hand side becomes large. However at intermediate times this inequality can be satisfied. 
For example, if C ss is sufficiently larger than £ SL , then the second term could become large enough to 
dominate the right-hand side. This condition shows that the highly specific nature of ribozyme-assisted 
synthesis of a short chain by an identical or complimentary short chain is of greater importance to the 
instability than the less selective synthesis of a short chain by a long chain. Alternatively, the first 
term indicates that a large value of £ LL (ribozyme-assisted synthesis of long chains by long chains) can 
instigate the instability. 

The other instability condition AC < 2NBD reduces to 

^ (C„Y + 2(^NZ) X - 1 l(\-l)(„Y - 2(^NZ] + (6.40) 



In the limits Z — > and x\ — > this inequality is satisfied, implying the system is unstable in these 
limits even though the previous inequality (|6.39|) is not satisfied. Due to the large size of N, we expect 
iV~" 2A to be insignificant, and so if this inequality were to hold, it would be mainly due to the presence 
of ribozyme-assisted synthesis of short chains by long - the C SL term. 
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7 Discussion 



The question of which type of RNA chain (one ribonucleotide sequence or a set of such sequences) will 
grow from an initial soup of monomers presents an extremely difficult challenge both experimentally 
and theoretically. From the latter perspective with which we have been concerned in this paper, 
it is a massively degenerate version of the problem of polymorphism, that is of determining which 
particular crystal structure among several will grow from a melt or supersaturated solution. This 
problem is known to depend on the kinetics of the process-it is not an equilibrium phenomena: the 
thermodynamically most stable macroscopic structure is not necessarily the one which nucleates first 
or grows the fastest, as is well known in the fields of biomineralisation |30| and the crystallisation of 



macromolecules [31, |32|]. In the present context, the difficulty is compounded many times over owing 
to the combinatorial complexity of polyribonucleotide self-reproduction, which causes the problem to 
acquire an NP hard character in the language of algorithmic complexity theory [[|. 

The models we have introduced and analysed in this paper include many catalytic mechanisms 
which affect the rate at which various reactions proceed; these include inter alia template-based and 
ribozymal replicase-assisted RNA synthesis. Our paper has addressed a basic question concerning the 
putative origins of the RNA world from the standpoint of plausible chemical kinetics and provides 
important evidence supporting the notion that self-reproducing ribonucleotide polymers could quite 
easily have emerged from a prebiotic soup. In reaching this conclusion, we have analysed carefully 
the nonlinearities with which this problem is replete. It is worth pointing out that much of the 
previous work on related issues, including that of Eigen et al. on hypercycles P], [T2|, [TT], |TD[ was based 
on the analysis of linearised kinetic equations, a procedure that drastically suppresses the difficulty 
of the problem together with the richness of possible behaviour. Nuno et a/.'s work, like that of 
Eigen et al., also assumes the existence of self- reproducing hypercycles, but does incorporate certain 
nonlinearities |Tj| l5| . Addressing a different problem, they model template-based synthesis using 
quadratic nonlinearities as in our models. In addition, our models, include ribozymically assisted 
synthesis described by cubic nonlinearities. However, in our approach the coarse-graining contraction 
procedure, used to overcome the combinatorial explosion of possible proliferating polyribonucleotide 
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sequences, introduces extremely high nonlinearities into the analysis. Overall, it is the key combination 
of hydrolysis and catalysis which we find produces stable populations of a small number of selected 
chain types. 

Cross-catalysis (error-prone template-based copying) speeds up self-reproduction generally, but it 
tends to even out the distribution of sequences since it acts non-specifically. Autocatalysis and hydrol- 
ysis are sequence specific. Hydrolysis splits one chain into two: whereas a single long chain can catalyse 
only one reaction, the two shorter chains produced by hydrolysis can each catalyse reactions which make 
chains similar to the original long chain. Thus hydrolysis aids the autocatalytic replication process by 
recycling and hence multiplying the number of chains of a given length capable of autocatalysis. This 
recycling of material and information can be thought of as primordial metabolism. If hydrolysis acts 
more slowly than autocatalysis there is not enough catalytic material around to maintain the instability 
driving the preferential formation of certain sequences and the suppression of others; then other factors 
will even out the differences in concentration of different species and no selectivity ensues. 

The results of Section |6| show that even with hydrolysis, the presence of catalytic chain growth 
mechanisms implies that viable concentrations of long chains will eventually be formed if one is prepared 



to wait long enough. At the end of Section |6.2.5| , a simple calculation was carried out to estimate the 
length of this induction time. Although estimates of numerical values of our parameters are extremely 
hard to find, the data available is consistent with a timescale of months for the formation of viable 
concentrations of chains of ten bases. 

In conclusion, we have demonstrated that it is possible to realise the selection of certain self- 
replicating RNA polymer chains in a reasonable amount of time starting from plausible assumptions 
about the chemistry and initial conditions that could have prevailed within a putative prebiotic soup 
comprised of /3-D-ribonucleotide monomers. 
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A Analysis of an alternative kinetic model for pure autocat- 
alytic RNA replication without hydrolysis 



In Section |3J| we considered a detailed kinetic model for RNA replication that omitted hydrolysis and 
assumed that rate coefficients for polymerization were independent of chain length. In this section we 
retain the same structure for the kinetic equations ( |3.14j ), but instead of assuming that the reaction 



rate coefficients are independent of chain length as in ( [3.17|) , here we make them proportional to chain 



length: a r>s = rs, a r ^ = er. The flux equations (|3.17| ) are thus replaced by 



= c7cf (er + £ rscl , J** = (er + £ rsc e s , (Al) 



and inserted into the kinetic equations ( 3.14Q (hydrolysis is again neglected). We continue ignore the 



exact order of nucleotides in the chains (7), concentrating purely on the length of chains, in order to 



investigate kinetics by which a distribution of long chains may be formed. We define f r = D T | 7 |=r c r 



.7 



in the same sense as before, to simplify the governing equations (|3.14j) and ([Al]) to 



fr = 2/i/r-i (e(r - 1) + J> - l)sA - 2fJ r (er + ^ rsfA 



(A2) 



As in the previous example, the system of ordinary differential equations can be solved using the 
generating function ( |3.19| ), which now satisfies the partial differential equation 

BF BF 

— - 2fr — (1 - e- z )(e - G (t)) = 2fle~\e - G (t)), (A3) 

where G (t) = f | 2=0 (thus G (0) = 

Equation (|A3|) is significantly harder to solve than the previous example; it is best analysed using 
the method of characteristics. First, however, we shall find the function Go(t). We define G(z,t) = |jj 
so that G (t) = G(0,t) then (|A3|) implies G(z,t) satisfies the equation 



BC BC 

— = 2A(1 - e -*)(6 - G )— - e-\e - G )G = -he~\e - G ). (A4) 

In the limit z — > this reduces to the ODE 

dGn ( ( ft 4- e )e 2tfl ^ fl -^ - 2e \ 

^ = 2/ 1 [ £ -G ( ( )][G (()-/ 1 ] G oW = / 1 ( ( y;+ £) ) e2 , /i , /i _ () _ 2/ J , (A5) 
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when integrated using the initial condition Gq(0) = —fx- This function gives us the first indication of 
singular behaviour, for it blows up when 



t = t r :-- 



log 



2/i 



log(2) 



(A6) 



2/i(/i- e )'"°l/i+e/ 2ff 
To analyse the situation in more detail, we return to the full equation (|A3|) and solve the charac 



teristic equations 
dt 



a/7 
dz 

da 
dF 

da 



-2/ 1 (l-e-)(e-G'o(t)) 
2fie~*(e-G (t)) 



t = a + hx{r) 

e z = l + exp(h 2 (r)-2f 1 [ea-^G (s)ds]) 
r _ /, , f2 / [e-G {s))ds 



hs(r) + VI 



l + exp[fc 2 (r)- 2fxi.es - J£ G (p)dp)] 

(A7) 



Now we apply the initial conditions F(z, 0) = j\e z on the line a = 

t = ^(r) = 

2 = r h 2 {r) = log(e r - V 

F = he~* => h 3 {r) = f x e-\ 

It is possible to eliminate both a and r to obtain the solution 



(A8) 



F(z,t) 



h 



+ 



2f 1 2 [e-G {t-u)]du 



1 + (e 2 - 1) exp(2/ 1 [et - /„' G (u)dt;)]) ^0 1 + (e* - 1) exp(2/ 1 [ew - / U G„(t - p)dp]) 



(A9) 



however it is not obvious from this how the individual concentrations f r (t) behave. We are in a similar 
position to that found in Section |3.2| , and so we again treat the chain length distribution function 
f r (t)/F(0,t) as a time-dependent probability distribution function, extracting the number of chains, 
expected length and standard deviations from the function F(z,t). 
The number density of chains, 

(A - c) ^ 



N(t) = F(0,t) = A 



1 + 1oe 



(AlO) 



2f 1 -(f 1 + e)e*Mf^) J 

blows up at t = t c - the same place as Go(t) became singular (|A6|). To perform a local analysis we put 
t = t c — r; then N ~ — fx logr, as r — ► 0. The expected chain length is 

1 



E(t) 



1 dF {0 £) _ {(/i + e)e 2 ^^-)-2e} 



F(0,t) dz 



{2/ 1 -(/ 1 + 6)e 2 ^(/ 1 -)}[i + l og 



2/i-(/i+6)e 2t /i(/i- E > 



(All) 
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which also blows up at t = t c . Finally, the variance 



V(f) = (r 2 > - <rf 



1 d 2 F 



(0,t) 



G (tf 



F(0,t) dz 2 ^'"' F(0,t) 2 ' 
for general e is a very complicated function, but in the limit e — > 0, it simplifies to 

V(f) 



(A12) 



(2e 2 *A 2 - l) 




1 - log (2 - e 2 *A 2 ) " 


- e^l 


(2 - e 2t fi 


f 


r l - log (2-e 2 */?)" 


2 



(A13) 



Plotting E and (E ± v V) against time for < t < t c , we find that the expected chain length (E) 
remains small for a long period of time, before increasing rapidly as t — > t c , where it diverges. E + v^V 
behaves similarly although it always lies above E; however E — \/V diverges to —00 as t — > t c , implying 
that shorter chains are suppressed by the kinetics. These results suggest that uniform growth is only 
mildly unstable during periods of slow increase in concentrations of long RNA chains, but highly 
unstable when rapid change occurs. 
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TABLE 1 



AX i # XI uncatalysed, with forward-rate coeff. = e 

AXl + X] # 2X] autocatalysis, with forward-rate coeff. = a 

AX i + X% # A2 + A| crosscatalysis, with forward-rate coeff. = x 

AX 1 + Aj + Ag # 2A2 + A2 enzymatic-catalysis, with forward-rate coeff. = (. 
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Yn + AX, 
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Y n 
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2Y n 
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^ L m,n 

Y +Y 
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hydrolysis 
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ribozymic synthesis of long chains 
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